Low light image enhancement is the process of improving the quality of images taken in low light conditions.
Flicker artifacts, arising from unstable illumination and row-wise exposure inconsistencies, pose a significant challenge in short-exposure photography, severely degrading image quality. Unlike typical artifacts, e.g., noise and low-light, flicker is a structured degradation with specific spatial-temporal patterns, which are not accounted for in current generic restoration frameworks, leading to suboptimal flicker suppression and ghosting artifacts. In this work, we reveal that flicker artifacts exhibit two intrinsic characteristics, periodicity and directionality, and propose Flickerformer, a transformer-based architecture that effectively removes flicker without introducing ghosting. Specifically, Flickerformer comprises three key components: a phase-based fusion module (PFM), an autocorrelation feed-forward network (AFFN), and a wavelet-based directional attention module (WDAM). Based on the periodicity, PFM performs inter-frame phase correlation to adaptively aggregate burst features, while AFFN exploits intra-frame structural regularities through autocorrelation, jointly enhancing the network's ability to perceive spatially recurring patterns. Moreover, motivated by the directionality of flicker artifacts, WDAM leverages high-frequency variations in the wavelet domain to guide the restoration of low-frequency dark regions, yielding precise localization of flicker artifacts. Extensive experiments demonstrate that Flickerformer outperforms state-of-the-art approaches in both quantitative metrics and visual quality. The source code is available at https://github.com/qulishen/Flickerformer.
Low-light image enhancement aims to restore the visibility of images captured by visual sensors in dim environments by addressing their inherent signal degradations, such as luminance attenuation and structural corruption. Although numerous algorithms attempt to improve image quality, existing methods often cause a severe loss of intrinsic signal priors. To overcome these challenges, we propose a Dual-Stream Transformer Network (DST-Net) based on illumination-agnostic signal prior guidance and multi-scale spatial convolutions. First, to address the loss of critical signal features under low-light conditions, we design a feature extraction module. This module integrates Difference of Gaussians (DoG), LAB color space transformations, and VGG-16 for texture extraction, utilizing decoupled illumination-agnostic features as signal priors to continuously guide the enhancement process. Second, we construct a dual-stream interaction architecture. By employing a cross-modal attention mechanism, the network leverages the extracted priors to dynamically rectify the deteriorated signal representation of the enhanced image, ultimately achieving iterative enhancement through differentiable curve estimation. Furthermore, to overcome the inability of existing methods to preserve fine structures and textures, we propose a Multi-Scale Spatial Fusion Block (MSFB) featuring pseudo-3D and 3D gradient operator convolutions. This module integrates explicit gradient operators to recover high-frequency edges while capturing inter-channel spatial correlations via multi-scale spatial convolutions. Extensive evaluations and ablation studies demonstrate that DST-Net achieves superior performance in subjective visual quality and objective metrics. Specifically, our method achieves a PSNR of 25.64 dB on the LOL dataset. Subsequent validation on the LSRW dataset further confirms its robust cross-scene generalization.
Retinex theory provides a principled foundation for low-light image enhancement, inspiring numerous learning-based methods that integrate its principles. However, existing methods exhibits limitations in accurately decomposing reflectance and illumination components. To address this, we propose a Retinex-Guided Transformer~(RGT) model, which is a two-stage model consisting of decomposition and enhancement phases. First, we propose a latent space decomposition strategy to separate reflectance and illumination components. By incorporating the log transformation and 1-pixel offset, we convert the intrinsically multiplicative relationship into an additive formulation, enhancing decomposition stability and precision. Subsequently, we construct a U-shaped component refiner incorporating the proposed guidance fusion transformer block. The component refiner refines reflectance component to preserve texture details and optimize illumination distribution, effectively transforming low-light inputs to normal-light counterparts. Experimental evaluations across four benchmark datasets validate that our method achieves competitive performance in low-light enhancement and a more stable training process.
The low-light conditions are challenging to the vision-centric perception systems for autonomous driving in the dark environment. In this paper, we propose a new benchmark dataset (named DarkDriving) to investigate the low-light enhancement for autonomous driving. The existing real-world low-light enhancement benchmark datasets can be collected by controlling various exposures only in small-ranges and static scenes. The dark images of the current nighttime driving datasets do not have the precisely aligned daytime counterparts. The extreme difficulty to collect a real-world day and night aligned dataset in the dynamic driving scenes significantly limited the research in this area. With a proposed automatic day-night Trajectory Tracking based Pose Matching (TTPM) method in a large real-world closed driving test field (area: 69 acres), we collected the first real-world day and night aligned dataset for autonomous driving in the dark environment. The DarkDriving dataset has 9,538 day and night image pairs precisely aligned in location and spatial contents, whose alignment error is in just several centimeters. For each pair, we also manually label the object 2D bounding boxes. DarkDriving introduces four perception related tasks, including low-light enhancement, generalized low-light enhancement, and low-light enhancement for 2D detection and 3D detection of autonomous driving in the dark environment. The experimental results show that our DarkDriving dataset provides a comprehensive benchmark for evaluating low-light enhancement for autonomous driving and it can also be generalized to enhance dark images and promote detection in some other low-light driving environment, such as nuScenes.
Low-light image enhancement is challenging due to entangled degradations, mainly including poor illumination, color shifts, and texture interference. Existing methods often rely on complex architectures to address these issues jointly but may overfit simple physical constraints, leading to global distortions. This work proposes a novel anchor-then-polish (ATP) framework to fundamentally decouple global energy alignment from local detail refinement. First, macro anchoring is customized to (greatly) stabilize luminance distribution and correct color by learning a scene-adaptive projection matrix with merely 12 degrees of freedom, revealing that a simple linear operator can effectively align global energy. The macro anchoring then reduces the task to micro polishing, which further refines details in the wavelet domain and chrominance space under matrix guidance. A constrained luminance update strategy is designed to ensure global consistency while directing the network to concentrate on fine-grained polishing. Extensive experiments on multiple benchmarks show that our method achieves state-of-the-art performance, producing visually natural and quantitatively superior low-light enhancements.
Limited illumination often causes severe physical noise and detail degradation in images. Existing Low-Light Image Enhancement (LLIE) methods frequently treat the enhancement process as a blind black-box mapping, overlooking the physical noise transformation during imaging, leading to suboptimal performance. To address this, we propose a novel LLIE approach, conceptually formulated as a physics-based attack and display-adaptive defense paradigm. Specifically, on the attack side, we establish a physics-based Degradation Synthesis (PDS) pipeline. Unlike standard data augmentation, PDS explicitly models Image Signal Processor (ISP) inversion to the RAW domain, injects physically plausible photon and read noise, and re-projects the data to the sRGB domain. This generates high-fidelity training pairs with explicitly parameterized degradation vectors, effectively simulating realistic attacks on clean signals. On the defense side, we construct a dual-layer fortified system. A noise predictor estimates degradation parameters from the input sRGB image. These estimates guide a degradation-aware Mixture of Experts (DA-MoE), which dynamically routes features to experts specialized in handling specific noise intensities. Furthermore, we introduce an Adaptive Metric Defense (AMD) mechanism, dynamically calibrating the feature embedding space based on noise severity, ensuring robust representation learning under severe degradation. Extensive experiments demonstrate that our approach offers significant plug-and-play performance enhancement for existing benchmark LLIE methods, effectively suppressing real-world noise while preserving structural fidelity. The sourced code is available at https://github.com/bywlzts/Attack-defense-llie.
This letter presents a novel training approach and loss function for learning low-light image enhancement auto-encoders. Our approach revolves around the use of a teacher-student auto-encoder setup coupled to a progressive learning approach where multi-scale information from clean image decoder feature maps is distilled into each layer of the student decoder in a mirrored fashion using a newly-proposed loss function termed Illumination-Aware Mirror Loss (IAML). IAML helps aligning the feature maps within the student decoder network with clean feature maps originating from the teacher side while taking into account the effect of lighting variations within the input images. Extensive benchmarking of our proposed approach on three popular low-light image enhancement datasets demonstrate that our model achieves state-of-the-art performance in terms of average SSIM, PSNR and LPIPS reconstruction accuracy metrics. Finally, ablation studies are performed to clearly demonstrate the effect of IAML on the image reconstruction accuracy.
Autonomous navigation in GPS-denied and visually degraded environments remains challenging for unmanned aerial vehicles (UAVs). To this end, we investigate the use of a monocular thermal camera as a standalone sensor on a UAV platform for real-time depth estimation and simultaneous localization and mapping (SLAM). To extract depth information from thermal images, we propose a novel pipeline employing a lightweight supervised network with recurrent blocks (RBs) integrated to capture temporal dependencies, enabling more robust predictions. The network combines lightweight convolutional backbones with a thermal refinement network (T-RefNet) to refine raw thermal inputs and enhance feature visibility. The refined thermal images and predicted depth maps are integrated into ORB-SLAM3, enabling thermal-only localization. Unlike previous methods, the network is trained on a custom non-radiometric dataset, obviating the need for high-cost radiometric thermal cameras. Experimental results on datasets and UAV flights demonstrate competitive depth accuracy and robust SLAM performance under low-light conditions. On the radiometric VIVID++ (indoor-dark) dataset, our method achieves an absolute relative error of approximately 0.06, compared to baselines exceeding 0.11. In our non-radiometric indoor set, baseline errors remain above 0.24, whereas our approach remains below 0.10. Thermal-only ORB-SLAM3 maintains a mean trajectory error under 0.4 m.
Most sRGB-based LLIE methods suffer from entangled luminance and color, while the HSV color space offers insufficient decoupling at the cost of introducing significant red and black noise artifacts. Recently, the HVI color space has been proposed to address these limitations by enhancing color fidelity through chrominance polarization and intensity compression. However, existing methods could suffer from channel-level inconsistency between luminance and chrominance, and misaligned color distribution may lead to unnatural enhancement results. To address these challenges, we propose the Variance-Driven Channel Recalibration for Robust Low-Light Enhancement (VCR), a novel framework for low-light image enhancement. VCR consists of two main components, including the Channel Adaptive Adjustment (CAA) module, which employs variance-guided feature filtering to enhance the model's focus on regions with high intensity and color distribution. And the Color Distribution Alignment (CDA) module, which enforces distribution alignment in the color feature space. These designs enhance perceptual quality under low-light conditions. Experimental results on several benchmark datasets demonstrate that the proposed method achieves state-of-the-art performance compared with existing methods.
Event cameras, with their high dynamic range, show great promise for Low-light Image Enhancement (LLIE). Existing works primarily focus on designing effective modal fusion strategies. However, a key challenge is the dual degradation from intrinsic background activity (BA) noise in events and low signal-to-noise ratio (SNR) in images, which causes severe noise coupling during modal fusion, creating a critical performance bottleneck. We therefore posit that precise event denoising is the prerequisite to unlocking the full potential of event-based fusion. To this end, we propose BiEvLight, a hierarchical and task-aware framework that collaboratively optimizes enhancement and denoising by exploiting their intrinsic interdependence. Specifically, BiEvLight exploits the strong gradient correlation between images and events to build a gradient-guided event denoising prior that alleviates insufficient denoising in heavily noisy regions. Moreover, instead of treating event denoising as a static pre-processing stage-which inevitably incurs a trade-off between over- and under-denoising and cannot adapt to the requirements of a specific enhancement objective-we recast it as a bilevel optimization problem constrained by the enhancement task. Through cross-task interaction, the upper-level denoising problem learns event representations tailored to the lower-level enhancement objective, thereby substantially improving overall enhancement quality. Extensive experiments on the Real-world noise Dataset SDE demonstrate that our method significantly outperforms state-of-the-art (SOTA) approaches, with average improvements of 1.30dB in PSNR, 2.03dB in PSNR* and 0.047 in SSIM, respectively. The code will be publicly available at https://github.com/iijjlk/BiEvlight.